1. Introduction

Purpose and Data Sources for NYC Shooting Location Analysis

This project analyzes shooting incidents in New York City with a focus on geographic and location-based patterns. The analysis uses two NYPD datasets: the historic dataset covering incidents from 2015 through 2024, and the 2025 year-to-date dataset providing the most recent records. Together, these data allow examination of long-term patterns and current trends in shootings across boroughs and location types.

Note: The analysis was limited to incidents from 2015 onward (2015 through the 2025 year-to-date records) to keep file sizes manageable for web-based visualization while still capturing meaningful temporal patterns.

The analysis includes descriptive summaries of borough-level shooting counts, location-based breakdowns (inside versus outside), and a Poisson regression model evaluating whether shooting counts vary by location type and victim race across boroughs. These approaches provide insight into how environmental and demographic factors relate to shooting incidents in New York City.

library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
library(knitr)
library(kableExtra)
library(broom)
library(tigris)
library(leaflet)
options(tigris_use_cache = TRUE, tigris_progress = FALSE)
# Read the Census API key from the CENSUS_API_KEY environment variable
# instead of hard-coding it in the script
census_api_key(Sys.getenv("CENSUS_API_KEY")) |>
  invisible()
dat1 = read.csv("./Data folder/Shooting_Historic.csv")
dat2 = read.csv("./Data folder/Shooting_2025.csv")

# Filter historic data to 2015 onwards
dat1 = dat1 |>
  mutate(year = as.numeric(substr(OCCUR_DATE, nchar(OCCUR_DATE) - 3, nchar(OCCUR_DATE)))) |>
  filter(year >= 2015)

dat_all = bind_rows(
  dat1 |> mutate(source = "historic"),
  dat2 |> mutate(source = "y2025")
)
# Recode LOC_OF_OCCUR_DESC into a three-level location_type; blank, NA,
# and unrecognized values all become "Unknown"
add_location_type = function(df) {
  df |>
    mutate(
      location_type = case_when(
        LOC_OF_OCCUR_DESC == "INSIDE" ~ "Inside",
        LOC_OF_OCCUR_DESC == "OUTSIDE" ~ "Outside",
        TRUE ~ "Unknown"
      )
    )
}

dat_all = add_location_type(dat_all)
dat1 = add_location_type(dat1)

dat_all_sf = dat_all |>
  filter(!is.na(Latitude) & !is.na(Longitude)) |>
  filter(Latitude != 0 & Longitude != 0) |>
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)

2. Descriptive Analysis

This section presents descriptive summaries of shooting incidents across the five boroughs and by location type. It focuses on understanding basic patterns in counts and distributions before moving into spatial visualization and regression modeling.

2a. Borough-Level Shooting Counts (2015-2024)

This subsection summarizes the total number of shooting incidents for each borough using the historic dataset from 2015 to 2024.

dat1_boro = dat1 |>
  group_by(BORO) |>
  summarise(
    n_total = n(),
    .groups = "drop"
  )

dat1_boro |>
  mutate(
    n_total = format(n_total, big.mark = ",")
  ) |>
  kable(
    col.names = c("Borough", "Total Shootings"),
    align = c("l", "r"),
    caption = "Shooting Incident Counts by Borough (2015 to 2024)"
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Shooting Incident Counts by Borough (2015 to 2024)

| Borough | Total Shootings |
|:---|---:|
| BRONX | 4,240 |
| BROOKLYN | 4,945 |
| MANHATTAN | 2,049 |
| QUEENS | 2,043 |
| STATEN ISLAND | 367 |

  • Comments: Brooklyn and the Bronx have the highest total shooting counts in this period, together accounting for about two-thirds of incidents citywide (9,185 of 13,644). Manhattan and Queens have nearly identical, moderate totals (2,049 and 2,043), while Staten Island has the fewest (367). These are raw counts, not population-adjusted rates.

2b. Borough-Level Shooting Counts (All Years Combined)

This subsection presents shooting incident counts for each borough using the combined dataset, which covers 2015 through the 2025 year-to-date records.

dat_all_boro = dat_all |>
  group_by(BORO) |>
  summarise(
    n_total = n(),
    .groups = "drop"
  )

dat_all_boro |>
  mutate(
    BORO = stringr::str_to_title(stringr::str_to_lower(BORO)),
    n_total = format(n_total, big.mark = ",")
  ) |>
  kable(
    col.names = c("Borough", "Total Shootings"),
    align = c("l", "r"),
    caption = "Shooting Incident Counts by Borough (All Years Including 2025)"
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Shooting Incident Counts by Borough (All Years Including 2025)

| Borough | Total Shootings |
|:---|---:|
| Bronx | 4,538 |
| Brooklyn | 5,211 |
| Manhattan | 2,160 |
| Queens | 2,129 |
| Staten Island | 375 |

  • Comments: The borough ranking is unchanged when the 2025 year-to-date records are added. Brooklyn and the Bronx continue to have the highest incident counts, and Staten Island the lowest. Because 2025 adds only 769 incidents to the 2015-2024 total, this comparison shows that the most recent year does not alter the pattern; it does not by itself demonstrate stability from year to year.

2c. Location-Based Analysis (Inside vs Outside)

This subsection examines the distribution of shooting incidents by location type (inside buildings versus outside) for the historic period.

dat1_location = dat1 |>
  filter(location_type != "Unknown") |>
  group_by(location_type) |>
  summarise(
    n_total = n(),
    .groups = "drop"
  )

dat1_location |>
  mutate(
    n_total = format(n_total, big.mark = ",")
  ) |>
  kable(
    col.names = c("Location", "Total Shootings"),
    align = c("l", "r"),
    caption = "Shooting Incidents by Location Type (2015 to 2024)"
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Shooting Incidents by Location Type (2015 to 2024)

| Location | Total Shootings |
|:---|---:|
| Inside | 682 |
| Outside | 3,466 |

  • Comments: Among incidents with a known location type, about 84% occurred outside (3,466 of 4,148), consistent with street-level gun violence being more common than indoor incidents. This result rests on a minority of the data: only 4,148 of the 13,644 incidents in this period have a recorded location type, so roughly 70% are classified as unknown and excluded here.
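
The size of the unknown category can be worked out from the two tables in this section. The following is a minimal sketch in base R with the totals typed in by hand from those tables; the variable names are introduced here for illustration and do not appear in the analysis code.

```r
# Totals copied from the 2015-2024 tables above
n_all     <- 4240 + 4945 + 2049 + 2043 + 367  # all incidents, from the borough table
n_inside  <- 682                              # from the location-type table
n_outside <- 3466

n_known   <- n_inside + n_outside             # incidents with a recorded location type
n_unknown <- n_all - n_known                  # incidents recoded to "Unknown"

# Share of all incidents without a usable location type, in percent
pct_unknown <- round(100 * n_unknown / n_all, 1)

# Outside share among incidents with a known location type, in percent
pct_outside_known <- round(100 * n_outside / n_known, 1)

print(c(unknown = pct_unknown, outside_among_known = pct_outside_known))
```

This assumes both tables were built from the same `dat1` object, as the code above suggests.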

2d. Location-Based Analysis (All Years Combined)

This subsection presents the distribution of shooting incidents by location type using the full dataset.

dat_all_location = dat_all |>
  filter(location_type != "Unknown") |>
  group_by(location_type) |>
  summarise(
    n_total = n(),
    .groups = "drop"
  )

dat_all_location |>
  mutate(
    n_total = format(n_total, big.mark = ",")
  ) |>
  kable(
    col.names = c("Location", "Total Shootings"),
    align = c("l", "r"),
    caption = "Shooting Incidents by Location Type (All Years Including 2025)"
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Shooting Incidents by Location Type (All Years Including 2025)

| Location | Total Shootings |
|:---|---:|
| Inside | 759 |
| Outside | 4,158 |

  • Comments: Outside shootings still far exceed inside shootings when all years are included, making up about 85% of incidents with a known location type (4,158 of 4,917).

2e. Location-Based Analysis by Borough (All Years)

This subsection cross-tabulates shooting incidents by borough and location type for the full dataset.

dat_all_loc_boro = dat_all |>
  filter(location_type != "Unknown") |>
  group_by(BORO, location_type) |>
  summarise(
    n_total = n(),
    .groups = "drop"
  ) |>
  pivot_wider(
    names_from = location_type,
    values_from = n_total,
    values_fill = 0
  )

dat_all_loc_boro |>
  mutate(BORO = stringr::str_to_title(stringr::str_to_lower(BORO))) |>
  kable(
    col.names = c("Borough", "Inside", "Outside"),
    align = c("l", "r", "r"),
    caption = "Shooting Incidents by Borough and Location Type (All Years)"
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Shooting Incidents by Borough and Location Type (All Years)

| Borough | Inside | Outside |
|:---|---:|---:|
| Bronx | 251 | 1479 |
| Brooklyn | 277 | 1309 |
| Manhattan | 106 | 717 |
| Queens | 103 | 581 |
| Staten Island | 22 | 72 |

  • Comments: Every borough has more outside than inside shootings among incidents with a known location type. Brooklyn and the Bronx have the highest counts for both location types: Brooklyn has the most inside shootings (277) and the Bronx the most outside shootings (1,479).
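
Raw counts make it hard to see whether the inside/outside split differs between boroughs. As a rough sketch, the inside share per borough can be computed from the table above; the data frame `loc_boro` below is typed in by hand for illustration and is not an object from the analysis code.

```r
# Counts copied from the borough-by-location table above
loc_boro <- data.frame(
  borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  inside  = c(251, 277, 106, 103, 22),
  outside = c(1479, 1309, 717, 581, 72)
)

# Share of known-location shootings that occurred inside, in percent
loc_boro$known      <- loc_boro$inside + loc_boro$outside
loc_boro$pct_inside <- round(100 * loc_boro$inside / loc_boro$known, 1)

# Sort from highest to lowest inside share
loc_boro[order(-loc_boro$pct_inside), ]
```

Shares based on small counts, such as Staten Island's 94 known-location incidents, are less stable than those for the larger boroughs.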

3. Geographic Visualization

This section presents interactive maps that visually display geographic differences in shooting incidents across boroughs. Each point represents an individual shooting incident.

3a. Map of Total Shootings by Borough (All Years)

This map shows all shooting incidents from 2015 through 2025 as scatter points overlaid on borough boundaries.

# Cartographic boundaries for the five NYC counties; each county corresponds to one borough
nyc_boro = tigris::counties(state = "NY", cb = TRUE) |>
  filter(NAME %in% c("Bronx", "Kings", "New York", "Queens", "Richmond"))

boro_to_county = function(df) {
  df |>
    mutate(
      county_name = case_when(
        BORO == "MANHATTAN" ~ "New York",
        BORO == "BROOKLYN" ~ "Kings",
        BORO == "BRONX" ~ "Bronx",
        BORO == "QUEENS" ~ "Queens",
        BORO == "STATEN ISLAND" ~ "Richmond"
      )
    )
}

dat_all_boro = boro_to_county(dat_all_boro)
nyc_boro_map_all = nyc_boro |>
  left_join(dat_all_boro, by = c("NAME" = "county_name"))
mapview(
  nyc_boro_map_all,
  zcol = "n_total",
  layer.name = "Borough Totals",
  alpha.regions = 0.3
) + 
mapview(
  dat_all_sf,
  cex = 1,
  alpha = 0.5,
  col.regions = "red",
  layer.name = "All Shootings"
)
  • Comments: Each red point represents an individual shooting incident from 2015 to 2025 with valid recorded coordinates. The concentration in Brooklyn and the Bronx is clearly visible, showing the spatial clustering of gun violence in these boroughs. The borough shading indicates total counts, while the points show the recorded incident locations.

3b. Map Comparing Inside vs Outside Shootings (All Years)

This map displays both inside and outside shootings with different colors to compare their spatial distributions.

dat_inside_sf = dat_all_sf |>
  filter(location_type == "Inside")

dat_outside_sf = dat_all_sf |>
  filter(location_type == "Outside")
mapview(
  nyc_boro,
  alpha.regions = 0.1,
  layer.name = "NYC Boroughs"
) + 
mapview(
  dat_inside_sf,
  cex = 2,
  alpha = 0.6,
  col.regions = "blue",
  layer.name = "Inside (Blue)"
) +
mapview(
  dat_outside_sf,
  cex = 2,
  alpha = 0.6,
  col.regions = "orange",
  layer.name = "Outside (Orange)"
)
  • Comments: This comparison map shows inside shootings in blue and outside shootings in orange. Both types cluster in similar geographic areas, which is consistent with neighborhood-level factors influencing both indoor and outdoor gun violence. Outside shootings are more numerous and form denser clusters. Use the layer controls to toggle each type on and off for comparison.

4. Regression Analysis

This section uses Poisson regression to model shooting counts aggregated by borough, location type, and victim race. This approach allows us to examine whether shooting frequency varies by location and demographic factors.

4a. Poisson Regression Model

A Poisson regression model was fit with shooting counts as the outcome, aggregated by borough, location type, and victim race.
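
Written out, a main-effects Poisson model of this kind with the default log link takes the form below. The symbols are notation chosen here for illustration: $Y_{b\ell r}$ is the shooting count for borough $b$, location type $\ell$, and victim race group $r$.

```latex
\[
Y_{b\ell r} \sim \mathrm{Poisson}(\mu_{b\ell r}),
\qquad
\log \mu_{b\ell r} = \beta_0 + \alpha_{\ell} + \gamma_{r} + \delta_{b},
\]
\[
\alpha_{\text{Outside}} = \gamma_{\text{Black}} = \delta_{\text{Bronx}} = 0 .
\]
```

Under this parameterization, $\exp(\beta_0)$ is the expected count in the reference cell (Outside, Black, Bronx), and each exponentiated coefficient, for example $\exp(\alpha_{\text{Inside}})$, is the multiplicative change in expected count relative to the reference level. No offset term appears, so these are ratios of counts rather than of per-capita rates.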

dat_reg = dat_all |>
  filter(location_type != "Unknown") |>
  filter(!(VIC_RACE == "UNKNOWN" | VIC_RACE == "" | is.na(VIC_RACE) | VIC_RACE == "(null)")) |>
  mutate(
    race_group = case_when(
      VIC_RACE == "BLACK" ~ "Black",
      VIC_RACE == "WHITE HISPANIC" ~ "White Hispanic",
      VIC_RACE == "BLACK HISPANIC" ~ "Black Hispanic",
      VIC_RACE == "WHITE" ~ "White",
      VIC_RACE == "ASIAN / PACIFIC ISLANDER" ~ "Asian/Pacific Islander",
      VIC_RACE == "AMERICAN INDIAN/ALASKAN NATIVE" ~ "American Indian/Alaska Native",
      TRUE ~ "Other"
    )
  )

dat_agg = dat_reg |>
  group_by(BORO, location_type, race_group) |>
  summarise(
    n_shootings = n(),
    .groups = "drop"
  ) |>
  mutate(
    location_type = factor(location_type, levels = c("Outside", "Inside")),
    race_group = factor(race_group, levels = c("Black", "White Hispanic", "Black Hispanic", 
                                                "White", "Asian/Pacific Islander", 
                                                "American Indian/Alaska Native")),
    BORO = factor(BORO)
  )
fit = glm(
  n_shootings ~ location_type + race_group + BORO,
  data = dat_agg,
  family = poisson()
)

rr_table = tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
  mutate(
    term = case_when(
      term == "(Intercept)" ~ "Intercept (Outside, Black, Bronx)",
      term == "location_typeInside" ~ "Inside vs Outside",
      term == "race_groupWhite Hispanic" ~ "White Hispanic vs Black",
      term == "race_groupBlack Hispanic" ~ "Black Hispanic vs Black",
      term == "race_groupWhite" ~ "White vs Black",
      term == "race_groupAsian/Pacific Islander" ~ "Asian/Pacific Islander vs Black",
      term == "race_groupAmerican Indian/Alaska Native" ~ "American Indian/Alaska Native vs Black",
      term == "BOROBROOKLYN" ~ "Brooklyn vs Bronx",
      term == "BOROMANHATTAN" ~ "Manhattan vs Bronx",
      term == "BOROQUEENS" ~ "Queens vs Bronx",
      term == "BOROSTATEN ISLAND" ~ "Staten Island vs Bronx",
      TRUE ~ term
    ),
    estimate = round(estimate, 2),
    conf.low = round(conf.low, 2),
    conf.high = round(conf.high, 2),
    p.value = ifelse(p.value < 0.001, "<0.001", round(p.value, 3))
  ) |>
  select(
    Predictor = term,
    `Rate Ratio` = estimate,
    `Lower 95 CI` = conf.low,
    `Upper 95 CI` = conf.high,
    `P-value` = p.value
  )

rr_table |>
  kable(
    caption = "Poisson Regression: Rate Ratios for Shooting Counts",
    align = c("l", "r", "r", "r", "r")
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Poisson Regression: Rate Ratios for Shooting Counts

| Predictor | Rate Ratio | Lower 95 CI | Upper 95 CI | P-value |
|:---|---:|---:|---:|---:|
| Intercept (Outside, Black, Bronx) | 967.69 | 917.80 | 1019.57 | <0.001 |
| Inside vs Outside | 0.18 | 0.17 | 0.20 | <0.001 |
| White Hispanic vs Black | 0.28 | 0.26 | 0.30 | <0.001 |
| Black Hispanic vs Black | 0.16 | 0.14 | 0.17 | <0.001 |
| White vs Black | 0.03 | 0.03 | 0.04 | <0.001 |
| Asian/Pacific Islander vs Black | 0.04 | 0.04 | 0.05 | <0.001 |
| American Indian/Alaska Native vs Black | 0.00 | 0.00 | 0.00 | <0.001 |
| Brooklyn vs Bronx | 0.92 | 0.86 | 0.98 | 0.011 |
| Manhattan vs Bronx | 0.48 | 0.44 | 0.52 | <0.001 |
| Queens vs Bronx | 0.39 | 0.36 | 0.43 | <0.001 |
| Staten Island vs Bronx | 0.05 | 0.04 | 0.07 | <0.001 |

4b. Interpretation and Epidemiological Insights

The Poisson regression results provide insight into factors associated with shooting counts across New York City.

Location Effect: The rate ratio for inside versus outside locations is 0.18 (95% CI 0.17 to 0.20). Holding borough and victim race fixed, the expected count of inside shootings is about 82% lower than the expected count of outside shootings. This aligns with the descriptive findings showing outdoor shootings are more common.
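
As a quick sanity check, a rate ratio can be converted into a percent difference and compared with the unadjusted ratio from the descriptive tables. The sketch below uses the reported estimate and the all-years inside and outside totals; the variable names are illustrative only.

```r
# Reported rate ratio for Inside vs Outside (from the regression table)
rr_inside <- 0.18

# Percent difference in expected count relative to the reference level
pct_fewer_inside <- 100 * (1 - rr_inside)

# Unadjusted ratio of inside to outside counts (all years, known location type)
raw_ratio <- 759 / 4158

print(c(pct_fewer = pct_fewer_inside, raw_ratio = round(raw_ratio, 2)))
```

The two figures need not agree exactly, since the regression also drops incidents with unknown victim race, but close agreement suggests the adjusted estimate is driven mainly by the marginal inside/outside split.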

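Race and Shooting Counts: With Black victims as the reference group, every other victim race category has a rate ratio well below 1: 0.28 for White Hispanic, 0.16 for Black Hispanic, 0.04 for Asian/Pacific Islander, and 0.03 for White victims, with the American Indian/Alaska Native estimate rounding to 0.00. These are ratios of incident counts, not per-capita rates, because the model includes no population denominator. The differences reflect patterns in who is affected by gun violence across the city. Interpretation should consider that race categories may be associated with neighborhood-level factors, socioeconomic conditions, and systemic inequities rather than individual-level characteristics.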

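Borough Differences: Relative to the Bronx, the rate ratios are 0.92 for Brooklyn, 0.48 for Manhattan, 0.39 for Queens, and 0.05 for Staten Island, adjusting for location type and victim race. Because the model uses only incidents with a known location type and victim race, the Bronx rather than Brooklyn has the highest modeled counts, even though Brooklyn has more incidents overall.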

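Limitations: This analysis uses aggregated count data and cannot support individual-level inferences. The model includes no population offset, so the rate ratios compare raw counts rather than per-capita rates. It is also restricted to incidents with a known location type and victim race, which excludes a large share of records. A Poisson model assumes that the variance of the counts equals their mean; if the counts are overdispersed, the standard errors and confidence intervals will be too narrow. Important confounders such as population density, socioeconomic factors, and policing patterns are not included.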
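
One further point follows from how the counts are aggregated. `group_by()` followed by `summarise()` returns only combinations that occur at least once, so borough, location, and race cells with zero shootings never reach the model. The fit statistics report 53 null degrees of freedom, i.e. 54 cells, fewer than the 5 x 2 x 6 = 60 possible combinations. A minimal base R sketch of filling in such cells with explicit zeros is shown below, using a small hypothetical data frame (`agg`) rather than the actual `dat_agg`.

```r
# Hypothetical aggregated counts; the QUEENS/Inside combination is absent
agg <- data.frame(
  BORO          = c("BRONX", "BRONX", "QUEENS"),
  location_type = c("Inside", "Outside", "Outside"),
  n_shootings   = c(12, 40, 25)
)

# Every combination of the grouping variables that could have occurred
full_grid <- expand.grid(
  BORO             = unique(agg$BORO),
  location_type    = unique(agg$location_type),
  stringsAsFactors = FALSE
)

# Left-join observed counts onto the full grid, then turn missing counts into zeros
agg_complete <- merge(full_grid, agg, all.x = TRUE)
agg_complete$n_shootings[is.na(agg_complete$n_shootings)] <- 0

agg_complete
```

In a tidyverse pipeline, `tidyr::complete()` with a `fill` argument serves the same purpose. Whether refitting with the zero cells changes the estimates materially would need to be checked on the real data.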

4c. Model Fit Statistics

glance(fit) |>
  select(null.deviance, deviance, df.null, df.residual, AIC, BIC) |>
  mutate(across(everything(), ~round(.x, 2))) |>
  kable(
    col.names = c("Null Deviance", "Residual Deviance", "Null DF", "Residual DF", "AIC", "BIC"),
    caption = "Model Fit Statistics",
    align = "r"
  ) |>
  kable_styling(full_width = FALSE, position = "center")

Model Fit Statistics

| Null Deviance | Residual Deviance | Null DF | Residual DF | AIC | BIC |
|---:|---:|---:|---:|---:|---:|
| 11994.43 | 513.7 | 53 | 43 | 795.32 | 817.2 |

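  • Comments: The residual deviance (513.7) is far smaller than the null deviance (11,994.43), so the predictors account for most of the variation in cell counts. However, the residual deviance is about 11.9 times the residual degrees of freedom (513.7 / 43), well above the value near 1 expected under a Poisson model. This points to substantial overdispersion, so the standard errors, confidence intervals, and p-values in the rate ratio table are likely too optimistic. A quasi-Poisson or negative binomial model would be a natural check.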
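
The comparison of residual deviance to residual degrees of freedom can be made concrete. The sketch below first computes the ratio from the reported fit statistics, then illustrates on simulated data (not the NYPD data) how a quasi-Poisson fit keeps the same point estimates while widening the standard errors. The objects `toy`, `fit_pois`, and `fit_quasi` are hypothetical and exist only for this illustration.

```r
# Deviance-based dispersion estimate from the reported fit statistics
residual_deviance <- 513.7
residual_df       <- 43
dispersion_ratio  <- residual_deviance / residual_df  # values near 1 are consistent with Poisson

# Hypothetical illustration on simulated overdispersed counts
set.seed(1)
toy   <- data.frame(group = factor(rep(c("a", "b"), each = 50)))
toy$y <- rnbinom(100, mu = ifelse(toy$group == "a", 20, 40), size = 2)

fit_pois  <- glm(y ~ group, data = toy, family = poisson())
fit_quasi <- glm(y ~ group, data = toy, family = quasipoisson())

# Same coefficients; quasi-Poisson standard errors are scaled by sqrt(dispersion)
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
phi_hat  <- summary(fit_quasi)$dispersion  # Pearson-based dispersion estimate

print(round(dispersion_ratio, 2))
print(round(se_quasi / se_pois, 2))
```

Applied to the actual model, the equivalent step would be refitting with `family = quasipoisson()` or a negative binomial model such as `MASS::glm.nb()` and comparing the confidence intervals with those in the rate ratio table.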

5. Conclusion

This analysis examined NYPD shooting incident data with a focus on geographic and location-based patterns. Key findings include:

  1. Borough-level patterns: Brooklyn and the Bronx have the highest numbers of shooting incidents in both the 2015-2024 data and the combined data through 2025. Staten Island has the fewest incidents in both.

  2. Location type: Among incidents with known location, outside shootings are more common than inside shootings. This pattern holds in every borough and in both time windows, although location type is unrecorded for most incidents.

  3. Geographic clustering: The interactive maps reveal that shootings cluster in specific neighborhoods within each borough, with the highest density in parts of Brooklyn and the Bronx. Both inside and outside shootings follow similar geographic patterns.

  4. Regression results: The Poisson regression model indicates that shooting counts differ by location type, victim race, and borough. Inside locations have far lower counts than outside locations (rate ratio 0.18). Among incidents with known location type and victim race, the Bronx has the highest modeled counts, followed closely by Brooklyn (rate ratio 0.92), with Staten Island lowest (0.05). The reported significance levels should be read with caution given the signs of overdispersion in the fit statistics.

These findings contribute to understanding the epidemiology of gun violence in New York City and may inform targeted prevention strategies that consider both geographic concentration and situational factors.